Abstract — We present a test for the problem of decentralized 
sequential hypothesis testing, which is asymptotically optimum. 
By selecting a suitable sampling mechanism at each sensor, com- 
munication between sensors and fusion center is asynchronous 
and limited to 1-bit data. The proposed SPRT-like test turns out 
to be order-2 asymptotically optimum in the case of continuous 
time and continuous path signals, while in discrete time this 
strong asymptotic optimality property is preserved under proper 
conditions. If these conditions do not hold, then we can show 
optimality of order-1. Simulations corroborate the excellent 
performance characteristics of the test of interest. 

Index Terms — Sequential hypothesis testing, SPRT, Decentral- 
ized detection. 



I. Introduction 

SEQUENTIAL hypothesis testing, first introduced by Wald 
[1], is one of the most classical and well-studied problems 
of sequential analysis with applications in areas such as 
industrial quality control, signal detection, design of clinical trials, 
etc. [2], [3]. In the last two decades, there has been an intense 
interest in the decentralized (or distributed) formulation of 
the problem [4]-[13]. In this setup, the sequentially acquired 
information for decision making is distributed across a number 
of sensors and is transmitted to a global decision maker (fusion 
center), which is responsible for making the final decision. 

The main difference in the decentralized version of the 
problem is that the sensors are required to quantize their obser- 
vations before transmitting them to the fusion center; in other 
words, the sensors must send to the fusion center messages that 
belong to a finite alphabet [4]. This requirement is imposed 
by the need for data compression, smaller communication 
bandwidth and robustness of the sensor network, which are 
crucial issues in application areas such as signal processing, 
mobile and wireless communication, multisensor data fusion, 
internet security, robot networks and others [5]. 

Depending on the local memory that the sensors possess 
and whether there exists feedback from the fusion center, 
Veeravalli et al. [6] proposed five different configurations for 
the sensor network. In the same work, the authors found the 
optimal decentralized test -under a Bayesian setting- in the 
case of full feedback and local memory restricted to past 
decisions. Moreover, under a Bayesian setting, the case of 
no feedback and no local memory was treated in [7], while 
the case of full local memory with no feedback was treated in [8], [9]. 
However, in the last two cases no exactly optimal decentralized 
test has been discovered (see [10] for a review). 

In this work, we assume that the alphabet consists of two 
letters for all sensors, i.e. we allow the communication of only 
1-bit messages. Moreover, we do not use any feedback and 
we consider the configuration of partial local memory [11]. 
Specifically, we assume that at each time instant each sensor 
has access to the value of a summary statistic -that summarizes 
its previous observations- and uses this value, together with 
its current observation, in order to send a quantized signal 
to the fusion center. Under this configuration, an (order-1) 
asymptotically optimal scheme was suggested by Mei [11] 
under a Bayesian setting. 

Most schemes in the literature of decentralized detection 
require synchronous communication of the sensors with the fu- 
sion center. However, forcing distant sensors to communicate 
with the fusion center concurrently can be a very challenging 
practice. Thus, it is important to develop and analyze schemes 
where this communication protocol is asynchronous. Examples 
of asynchronous schemes can be found in [12] and [13]. 

Taking into account this consideration, we suggest that the 
sensors communicate with the fusion center asynchronously 
but also at random times. In particular, we suggest that 
the time instants at which sensor i communicates with the 
fusion center be stopping times that depend on the observed 
information at sensor i. We call this type of sampling adapted. 

A special case of adapted sampling is the Lebesgue (or 
level-triggered) sampling, which naturally induces 1-bit 
communication between sensors and the fusion center. Lebesgue 
sampling combined with a Sequential Probability Ratio Test 
at the fusion center gives rise to a detection structure known 
as the Decentralized Sequential Probability Ratio Test (D-SPRT), 
introduced by Hussain in [12] in a discrete time context. 
However, Hussain did not provide any theoretical support for 
this test nor evidence that it is efficient in any sense. 

Our main contribution in this work consists in formulat- 
ing and providing proof of asymptotic optimality of the D- 
SPRT, under both the discrete and the continuous time setup. 
Our asymptotic optimality result turns out to be stronger than 
the one available for the scheme proposed in [11], with simulation 
experiments corroborating our theoretical findings. 

The case of continuous time observations, which we analyze 
in Section IV, is clearly an idealization, since in practice we 
cannot record the sensor observations continuously. However, 
studying the problem under such a setup allows us to isolate 
the loss in efficiency due to discrete sampling of the underlying 
processes at the sensors. This provides valuable insight that 
leads to more efficient sampling schemes in the more realistic 
case of discrete time observations. 

This paper is organized as follows: Section I contains the 
Introduction. In Section II, we formulate the sequential hy- 
pothesis testing problem for the discrete and continuous time 
case under a centralized and decentralized setup. Moreover, we 
introduce the concept of adapted sampling, with emphasis on 
Lebesgue sampling and the D-SPRT. In Section III we recall 
the main optimality results for the centralized formulation 
since these tests serve as a point of reference for their 
decentralized counterparts. Section IV presents the asymptotic 
optimality properties of D-SPRT in the context of continuous 
time and continuous path observations while in Section V we 
develop the same results, at the expense of a more involved 
analysis, for the discrete time case. In this section we also 
examine the notion of oversampling that "reconciles" the 
behavior of the discrete time D-SPRT with its continuous 
time version and provides some important design observations. 
Finally, in Section VI we conclude our work. 

II. Centralized versus Decentralized Sequential 
Testing 

Suppose that we have a sensor network consisting of $K$ sensors 
as depicted in Fig. 1. Each sensor $i$ observes sequentially a 
realization of a stochastic process $\{\xi_t^i\}_{t \ge 0}$ with distribution $P^i$. 
We assume that the processes $\{\xi_t^1\}, \ldots, \{\xi_t^K\}$ are independent 
and we denote by $\{\mathcal{F}_t^i\}_{t \ge 0}$ the filtration generated by $\{\xi_t^i\}_{t \ge 0}$, 
where $\mathcal{F}_0^i = \{\emptyset, \Omega\}$. We also denote with $P$ the probability 
measure of $\{(\xi_t^1, \ldots, \xi_t^K)\}_{t \ge 0}$ and by $\{\mathcal{F}_t\}_{t \ge 0}$ the filtration 
generated by this vector process. From the assumption of 
independence across sensors, we have: $P = P^1 \times \cdots \times P^K$. 

Consider now the following two hypotheses for the probability 
measure $P$: 

$$H_0: P = P_0; \qquad H_1: P = P_1, \qquad (1)$$

where $P_j = P_j^1 \times \cdots \times P_j^K$, $j = 0, 1$, and $P_j^i$, $j = 0, 1$; $i = 1, \ldots, K$ 
are known probability measures. Thus $H_0, H_1$ are two 
simple hypotheses. For simplicity we also assume that each 
pair $P_0^i, P_1^i$ contains mutually absolutely continuous measures, 
therefore we can define the "local" log-likelihood ratio process 
at each sensor $i$ and for each time instant $t$, as follows 

$$u_t^i = \log \frac{dP_1^i}{dP_0^i}(\mathcal{F}_t^i), \quad 0 \le t < \infty. \qquad (2)$$



Fig. 1. Schematic representation of a decentralized sensor network 



Moreover, due to the independence of observations across 
sensors, we can write the "global" log-likelihood ratio $\{u_t\}$ 
in the sensor network as the sum of its local components, i.e. 

$$u_t = \log \frac{dP_1}{dP_0}(\mathcal{F}_t) = \sum_{i=1}^{K} u_t^i, \quad 0 \le t < \infty. \qquad (3)$$

Although the sensors observe sequentially the processes 
$\{\xi_t^i\}_{t \ge 0}$, they are allowed to communicate information to the 
fusion center only at a sequence of discrete times. In particular, 
we assume that the fusion center receives sequentially from 
each sensor $i$ the data $\{z_n^i\}$ at a strictly increasing sequence of 
time instants $\{\tau_n^i\}_{n \in \mathbb{N}}$. Each $\tau_n^i$ is an $\{\mathcal{F}_t^i\}$-adapted stopping 
time with $\tau_0^i = 0$ and $P_j(\tau_n^i < \infty) = 1$, $\forall n \in \mathbb{N}$, $j = 0, 1$ and 
$i = 1, \ldots, K$. We call this communication scheme adapted 
sampling and we refer to the stopping times $\{\tau_n^i\}$ as the 
sampling times in sensor $i$. Each $z_n^i$ constitutes a summary of the 
acquired information up to time $\tau_n^i$ and, as we mentioned 
in the Introduction, it takes values in a finite alphabet. Here 
we are going to assume that this set is binary. We should also 
emphasize that we do not consider any feedback scheme from 
the fusion center towards the sensors. 

Adapted sampling clearly implies asynchronous communi- 
cation between the sensors and the fusion center at random 
time instants. Thus, the number of samples sent from sensor 
i to the fusion center up to any time instant t is random and 
in general different for each sensor. We should mention that 
adapted sampling is a general framework that can incorporate 
various sampling mechanisms already used in the literature, 
in particular: 

• When $\tau_n^i - \tau_{n-1}^i = h$, $\forall n \in \mathbb{N}$, adapted sampling 
reduces to canonical deterministic sampling with constant 
sampling period $h > 0$, common to all sensors. 

• When $\{\tau_n^i - \tau_{n-1}^i\}_{n \in \mathbb{N}}$ is a sequence of i.i.d. random 
variables, independent of the observation process $\{\xi_t^i\}$, 
adapted sampling becomes independent random sampling. 
For example, if the intersampling periods 
$\{\tau_n^i - \tau_{n-1}^i\}_{n \in \mathbb{N}}$ are independent and exponentially distributed 
with the same mean, we recover the sampling scheme 
suggested in [13]. 

• When the sampling times depend on the observed 
sequence and are given by the following recursion 

$$\tau_n^i = \inf\{t \ge \tau_{n-1}^i : u_t^i - u_{\tau_{n-1}^i}^i \notin (-\underline{\Delta}_i, \overline{\Delta}_i)\}, \qquad (4)$$

where $\underline{\Delta}_i, \overline{\Delta}_i > 0$ are proper thresholds, then we call the 
resulting scheme Lebesgue (or level-triggered) sampling. 
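
The recursion in (4) can be illustrated with a minimal Python sketch. It assumes the local log-likelihood ratio is available on a uniform time grid, which is an assumption made only for illustration; the function name `lebesgue_sample` and its arguments are hypothetical and do not come from the paper. On a grid the path can overshoot a threshold, so the reported increments are only approximately equal to the thresholds, unlike the continuous-path case treated in the text.

```python
def lebesgue_sample(llr_path, dt, delta_low, delta_high):
    """Level-triggered sampling of one sensor's log-likelihood ratio.

    llr_path[k] is the local log-likelihood ratio at time k * dt
    (a discretized path, assumed here only for illustration).
    A sample is emitted the first time the increment since the previous
    sampling time leaves the open interval (-delta_low, delta_high).
    Returns a list of (sampling_time, bit) pairs, with bit 1 for an
    upper crossing and bit 0 for a lower crossing.
    """
    samples = []
    reference = llr_path[0]  # value at the most recent sampling time
    for k in range(1, len(llr_path)):
        increment = llr_path[k] - reference
        if increment >= delta_high:
            samples.append((k * dt, 1))
            reference = llr_path[k]
        elif increment <= -delta_low:
            samples.append((k * dt, 0))
            reference = llr_path[k]
    return samples


# Toy path: one upper crossing at time 2, one lower crossing at time 4.
toy_path = [0.0, 1.0, 2.5, 2.0, 0.4, 0.0]
print(lebesgue_sample(toy_path, 1.0, 2.0, 2.0))
```

Each sensor runs this rule on its own path, so the sampling times of different sensors are generally different, which is what makes the communication asynchronous.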

Although not evident at first, we should emphasize that the 
fusion center is the recipient not only of the data sequences 
$\{z_n^i\}$ but also of the sampling times $\{\tau_n^i\}$, which may carry 
information that is relevant to the hypothesis testing problem. 
Consequently, for each sensor $i$, let us define the sequence of 
intersampling periods $\{\delta_n^i\}_{n \in \mathbb{N}}$ where $\delta_n^i = \tau_n^i - \tau_{n-1}^i$. 

In parallel to the communication activity the fusion center, 
at each time instant t, uses all the received data up to time t, in 
order to make a decision whether to continue or stop receiving 
additional data. In the latter case it proceeds to make a final 
decision between the two hypotheses. 
Under a decentralized setup, denote with $m_t^i$ the number 
of pairs $(z_n^i, \delta_n^i)$ received by the fusion center from sensor $i$ 
up to (and including) time $t$. We can now define the filtration 
$\{\mathcal{G}_t\}_{t \ge 0}$ for the fusion center where $\mathcal{G}_t = \sigma\{(z_n^i, \delta_n^i),\ n \le m_t^i;\ i = 1, \ldots, K\}$ 
is the $\sigma$-algebra generated by all pairs 
$(z_n^i, \delta_n^i)$ received up to time $t$. The fusion center, based on this 
time increasing information, can use a $\{\mathcal{G}_t\}$-adapted stopping 
time $T$ to decide about stopping or continuing sampling. After 
stopping it also uses a $\mathcal{G}_T$-measurable decision function $d_T \in \{0, 1\}$ 
to select one of the two hypotheses. 

Under the centralized setup the fusion center gains access 
to the entire information acquired by the sensors up to time $t$. 
Consequently, if $\{\mathcal{F}_t\}_{t \ge 0}$ is the corresponding filtration with 
$\mathcal{F}_t = \sigma\{\xi_s^i,\ 0 \le s \le t,\ i = 1, \ldots, K\}$ denoting the $\sigma$-algebra 
generated by all acquired information up to time $t$, then 
the fusion center can use an $\{\mathcal{F}_t\}$-adapted stopping time $T$ 
and an $\mathcal{F}_T$-measurable decision function $d_T \in \{0, 1\}$ to stop 
sampling and provide a decision between the two hypotheses. 

Under both the centralized and the decentralized formulation, 
our intention is to define the pair $(T, d_T)$ optimally. 
Following Wald [1], for any $\alpha, \beta > 0$, we define the class 
of sequential tests for which the Type-I and Type-II error 
probabilities are below the two levels $\alpha, \beta$ respectively, that 
is, 

$$\mathcal{C}_{\alpha,\beta} = \{(T, d_T) : P_0(d_T = 1) \le \alpha \text{ and } P_1(d_T = 0) \le \beta\}. \qquad (5)$$

We can now define the following constrained optimization 
problem. 

Problem 1: Given $\alpha, \beta > 0$ such that $\alpha + \beta < 1$, find a 
sequential test $(\mathcal{S}, d_{\mathcal{S}}) \in \mathcal{C}_{\alpha,\beta}$ so that 

$$E_j[\mathcal{S}] = \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} E_j[T], \quad j = 0, 1. \qquad (6)$$



If we seek the test among the $\{\mathcal{F}_t\}$-adapted schemes we 
refer to the optimum centralized version, whereas if we limit 
ourselves to $\{\mathcal{G}_t\}$-adapted tests then we obtain the optimum 
decentralized procedure. Note that we attempt to find a single 
test that simultaneously minimizes two different criteria (the 
expected decision delay under the two hypotheses). It was 
Wald's remarkable insight that led first to conjecture [1] and 
then prove [14] that a test with such extraordinary optimality 
property indeed exists. 

Let us also introduce a second problem, proposed by 
Liptser and Shiryaev [15], which constitutes a slight variant 
of Problem 1. 

Problem 2: Given $\alpha, \beta > 0$ such that $\alpha + \beta < 1$, find a 
sequential test $(\mathcal{S}, d_{\mathcal{S}}) \in \mathcal{C}_{\alpha,\beta}$ so that 

$$-E_0[u_{\mathcal{S}}] = \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} \big(-E_0[u_T]\big), \qquad E_1[u_{\mathcal{S}}] = \inf_{(T, d_T) \in \mathcal{C}_{\alpha,\beta}} E_1[u_T]. \qquad (7)$$



Recalling that $\{u_t\}$ is the running log-likelihood ratio of the 
two probability measures, it is clear that the two expectations 
$E_1[u_t]$ and $-E_0[u_t]$ give rise to nonnegative and increasing 
functions of time. These two time functions constitute, in 
Information Theory, a popular divergence measure known 
as the Kullback-Leibler (K-L) divergence. This interesting 
information theoretic criterion reduces to the usual average 
detection delay when the signals are i.i.d. (in discrete time) or 
Brownian motions with constant drift (in continuous time). 

It is clear that any decentralized scheme is bound to be 
inferior in performance to the optimum centralized test. This 
is true for two major reasons: first, because a decentralized 
test has access to less information ($\{z_n^i\}$ being a summary of 
$\{\xi_t^i\}$), and second, because of loss in time resolution ($\{\tau_n^i\}$ being 
a sampled version of the actual time $t$). The main goal of 
our current work is to find decentralized schemes where this 
performance loss can be quantified and propose methods for 
controlling it. 

Regarding the decentralized version of Problems 1 and 2, we 
must emphasize that, the way it is stated, it is assumed that 
the sampling/quantization policy, namely the mechanism by 
which the pairs $\{(z_n^i, \delta_n^i)\}$ are generated from the observation 
sequence $\{\xi_t^i\}$, is already specified. Of course one might 
extend both problems by including an additional minimization 
over the sampling/quantization policy as well, thus optimizing 
all parts of the decentralized test. Finding however optimum, 
per se, decentralized tests that solve the extended version of 
the two problems turns out to be an extremely challenging 
task. For this reason we focus on suboptimum procedures. 

To assess the quality of any decentralized test, since the 
optimum decentralized test is not available, we can compare 
it against the centralized optimum scheme which is known 
in several important cases. We are in particular interested 
in asymptotically optimum tests. If $\mathcal{S}$ denotes the stopping 
time corresponding to the optimum centralized test that solves 
Problem 1 or 2 and $T$ the stopping time of a decentralized (or 
even centralized) competitor, then we distinguish the following 
degrees of asymptotic optimality¹: 

We will say that a test is asymptotically optimal of order-1, 
if for $j = 0, 1$ and as $\alpha, \beta \to 0$, we have 

$$\frac{E_j[T]}{E_j[\mathcal{S}]} = 1 + o(1), \quad \text{or} \quad \frac{E_j[u_T]}{E_j[u_{\mathcal{S}}]} = 1 + o(1), \qquad (8)$$

for Problems 1 and 2 respectively. 

We will say that a test is asymptotically optimal of order-2, 
if for $j = 0, 1$ and as $\alpha, \beta \to 0$, we have 

$$E_j[T] - E_j[\mathcal{S}] = O(1), \quad \text{or} \quad E_j[u_T] - E_j[u_{\mathcal{S}}] = O(1), \qquad (9)$$

for Problems 1 and 2 respectively. 

Finally, even though we will not consider this form of 
asymptotic optimality here, we define a test to be 
asymptotically optimal of order-3, if for $j = 0, 1$ and as $\alpha, \beta \to 0$, we 
have 

$$E_j[T] - E_j[\mathcal{S}] = o(1), \quad \text{or} \quad E_j[u_T] - E_j[u_{\mathcal{S}}] = o(1). \qquad (10)$$

It is clear that order-3 optimality is stronger than order-2, 
which is stronger than order-1. Indeed order-2 implies order-1 
because expected delays and K-L divergences increase without 
bound as $\alpha, \beta \to 0$. 
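
The implication from order-2 to order-1 can be written out in one line. The following is a short worked step for Problem 1, using only the definitions above and the fact that the optimal expected delay grows without bound as the error probabilities vanish; the same computation applies to the K-L criterion of Problem 2.

```latex
\frac{E_j[T]}{E_j[\mathcal{S}]}
  = 1 + \frac{E_j[T] - E_j[\mathcal{S}]}{E_j[\mathcal{S}]}
  = 1 + \frac{O(1)}{E_j[\mathcal{S}]}
  = 1 + o(1),
\qquad \text{since } E_j[\mathcal{S}] \to \infty \text{ as } \alpha, \beta \to 0 .
```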

¹We recall the difference between the notations $\Theta(\cdot)$, $O(\cdot)$ and $o(\cdot)$. If $\omega$ 
is a parameter that tends to $0$ or $\infty$ and $A(\omega), B(\omega)$ are functions of $\omega$, then 
$A(\omega) = \Theta(B(\omega))$ means that $|A(\omega)|/|B(\omega)|$ is uniformly bounded away 
from $0$ and $\infty$; $A(\omega) = O(B(\omega))$ that the same ratio is bounded away from 
$\infty$; and $A(\omega) = o(B(\omega))$ that $|A(\omega)|/|B(\omega)| \to 0$ as $\omega$ tends to $0$ or $\infty$. 

In order to establish any form of asymptotic optimality, it 
is evident from the previous definitions that we need to recall 
the major results of the optimum centralized theory. 

III. Optimum Centralized Sequential Testing 

The optimization problems defined in (6) and (7) are associated 
with the well celebrated Sequential Probability Ratio Test 
(SPRT) proposed by Wald [1], which is defined as follows 

$$\mathcal{S} = \inf\{t \ge 0 : u_t \notin (-A, B)\}, \qquad d_{\mathcal{S}} = \begin{cases} 1 & \text{if } u_{\mathcal{S}} \ge B \\ 0 & \text{if } u_{\mathcal{S}} \le -A, \end{cases} \qquad (11)$$

where $A, B > 0$ are two thresholds and $\mathcal{S}$ is the first time 
the global log-likelihood ratio process $\{u_t\}$ leaves the open 
interval $(-A, B)$. The decision function $d_{\mathcal{S}}$, on the other hand, 
is an $\mathcal{F}_{\mathcal{S}}$-measurable random variable, according to which $H_0$ 
($H_1$) is accepted if the lower (upper) threshold is first crossed. 
The two thresholds $A, B$ are selected so that the two error 
probability constraints in (5) are satisfied with equality. 
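
The stopping and decision rule in (11) can be sketched in a few lines of Python. This is a minimal illustration assuming the global log-likelihood ratio is observed through a stream of increments (as in discrete time); the function name `sprt` and its return convention are hypothetical choices made here, not part of the paper.

```python
def sprt(llr_increments, A, B):
    """Sequential Probability Ratio Test on a stream of LLR increments.

    The running sum u plays the role of the global log-likelihood ratio.
    The test stops the first time u leaves the open interval (-A, B).
    Returns (n, decision): n is the number of increments consumed and
    decision is 1 (accept H1) for an upper exit, 0 (accept H0) for a
    lower exit, or None if the stream ends before any exit.
    """
    u = 0.0
    n = 0
    for increment in llr_increments:
        n += 1
        u += increment
        if u >= B:
            return n, 1
        if u <= -A:
            return n, 0
    return n, None


print(sprt([0.5, 0.5, 1.2], A=2.0, B=2.0))   # upper exit at the third step
print(sprt([-1.0, -1.5], A=2.0, B=2.0))      # lower exit at the second step
```

Returning `None` when the data run out keeps the sketch honest about the fact that a finite stream need not contain an exit, whereas the theory assumes stopping occurs with probability one.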

In continuous time Shiryaev [16] considered the following 
hypothesis testing problem 

$$H_0: \xi_t^i = w_t^i; \qquad H_1: \xi_t^i = \mu^i t + w_t^i, \quad i = 1, \ldots, K, \qquad (12)$$

where $\{w_t = (w_t^1, \ldots, w_t^K)\}_{t \ge 0}$ is a $K$-dimensional Wiener 
process and $\mu = (\mu^1, \ldots, \mu^K) \in \mathbb{R}^K$ are constant drifts. The 
local log-likelihood ratio is equal to $u_t^i = -0.5(\mu^i)^2 t + \mu^i \xi_t^i$ 
and by summing the local components we can compute $u_t$ and 
apply the SPRT, which is optimum in the sense of Problem 1 
and Problem 2. 

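In the Brownian motion case, we also have exact formulas 
for the optimum performance. Specifically 

$$E_0[\mathcal{S}] = \frac{2}{\|\mu\|^2} H(\alpha, \beta); \qquad E_1[\mathcal{S}] = \frac{2}{\|\mu\|^2} H(\beta, \alpha), \qquad (13)$$

where $H(x, y) = x \log\!\big(\tfrac{x}{1-y}\big) + (1 - x) \log\!\big(\tfrac{1-x}{y}\big)$. The two 
thresholds that guarantee that the two error probability 
constraints are satisfied with equality are given by 

$$A = \log\Big(\frac{1-\alpha}{\beta}\Big); \qquad B = \log\Big(\frac{1-\beta}{\alpha}\Big). \qquad (14)$$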
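
Formulas (13) and (14) are easy to evaluate numerically. The sketch below is only an illustration of those two formulas; the function names (`H`, `sprt_thresholds`, `expected_delays`) are hypothetical, and the drift vector is passed as a plain list.

```python
import math


def H(x, y):
    """H(x, y) = x log(x / (1 - y)) + (1 - x) log((1 - x) / y), as in (13)."""
    return x * math.log(x / (1.0 - y)) + (1.0 - x) * math.log((1.0 - x) / y)


def sprt_thresholds(alpha, beta):
    """Thresholds (A, B) of (14) that meet both error constraints with equality."""
    A = math.log((1.0 - alpha) / beta)
    B = math.log((1.0 - beta) / alpha)
    return A, B


def expected_delays(alpha, beta, mu):
    """Optimal expected delays (E_0[S], E_1[S]) of (13) for drift vector mu."""
    norm_sq = sum(m * m for m in mu)
    return 2.0 / norm_sq * H(alpha, beta), 2.0 / norm_sq * H(beta, alpha)


A, B = sprt_thresholds(0.01, 0.01)
e0, e1 = expected_delays(0.01, 0.01, [1.0, 1.0])
print(A, B, e0, e1)
```

For equal error levels the two thresholds coincide and so do the two expected delays, which is the symmetric situation used later in the simulations.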



A significantly richer class of hypothesis testing problems 
was proposed by Liptser and Shiryaev [15] that involves Ito 
processes. In particular 

$$H_0: \xi_t^i = w_t^i; \qquad H_1: \xi_t^i = \int_0^t \mu_s^i \, ds + w_t^i, \quad i = 1, \ldots, K, \qquad (15)$$

where, as before, $\{w_t = (w_t^1, \ldots, w_t^K)\}_{t \ge 0}$ is a $K$-dimensional 
Wiener process and $\{\mu_t = (\mu_t^1, \ldots, \mu_t^K)\}_{t \ge 0}$ a $K$-dimensional 
$\{\mathcal{F}_t\}$-adapted process satisfying² 

$$P_j\Big(\int_0^t \|\mu_s\|^2 \, ds < \infty\Big) = P_j\Big(\int_0^\infty \|\mu_s\|^2 \, ds = \infty\Big) = 1, \qquad E_j\Big[\exp\Big(\frac{1}{2}\int_0^t \|\mu_s\|^2 \, ds\Big)\Big] < \infty, \qquad (16)$$

for all $t \ge 0$, $j = 0, 1$. The local log-likelihood ratio $u_t^i$ takes 
the form 

$$u_t^i = -\int_0^t 0.5 (\mu_s^i)^2 \, ds + \int_0^t \mu_s^i \, d\xi_s^i, \qquad (17)$$

which again allows for the computation of $u_t$ and the 
application of SPRT. It is also interesting to mention that, in this 
particular case, the K-L divergence can be equivalently written as

$$-E_0[u_t] = E_0\Big[\int_0^t 0.5 \|\mu_s\|^2 \, ds\Big], \qquad E_1[u_t] = E_1\Big[\int_0^t 0.5 \|\mu_s\|^2 \, ds\Big], \qquad (18)$$

which clearly reveals the nonnegative and time increasing 
nature of this alternative criterion. As proven in [15], under 
this more general setup, SPRT is optimum in the sense defined 
by Problem 2, delivering the following optimal performance 

$$-E_0[u_{\mathcal{S}}] = H(\alpha, \beta); \qquad E_1[u_{\mathcal{S}}] = H(\beta, \alpha), \qquad (19)$$

with the thresholds $A, B$ defined according to (14), for the 
two constraints in (5) to be satisfied with equality. 

²The last condition in (16) is known as the Novikov condition and assures 
that $\{e^{u_t}\}$ is a martingale. Alternative, more relaxed conditions that guarantee 
the martingale property can be found in [17, Page 199]. 

In discrete time, SPRT is known to be optimum in the sense 
of Problem 1 and Problem 2 when the vector sequence $\{\xi_t\}$ 
with $\xi_t = (\xi_t^1, \ldots, \xi_t^K)$ is i.i.d. with independent components 
under both hypotheses. In particular under the two hypotheses 
we have 

$$H_0: \xi_t \sim F_0(\xi^1, \ldots, \xi^K) = \prod_{i=1}^{K} F_0^i(\xi^i); \qquad H_1: \xi_t \sim F_1(\xi^1, \ldots, \xi^K) = \prod_{i=1}^{K} F_1^i(\xi^i), \qquad (20)$$

where $F_j^i(x)$ denotes the cdf of the data acquired by sensor 
$i$ when hypothesis $H_j$ is true and "$\sim$" means "distributed 
according to". For this case the local log-likelihood ratio takes 
the form 

$$u_t^i = \sum_{s=1}^{t} \log \frac{dF_1^i(\xi_s^i)}{dF_0^i(\xi_s^i)},$$

and by summing over $i$ we can compute the global 
log-likelihood ratio and apply the SPRT. The proof of optimality of 
SPRT was first offered by Wald and Wolfowitz in [14]. In fact 
this proof constitutes the first optimality result of Sequential 
Analysis. We can now make the following remarks: 

• The SPRT has also been proven to be optimal in the case 
where the $\{\xi_t^i\}$ are independent homogeneous Poisson 
processes [18]. This problem however is not particularly 
interesting under the decentralized setup since an arrival 
at a sensor can be signaled to the fusion center using 
simply one bit of information. 

• In discrete time, SPRT is known to be optimum only in 
the i.i.d. case. Unfortunately no analog to the Ito class 
result for Problem 2 has been developed so far. 

From the optimum centralized theory we conclude that in 
order to apply the SPRT we need the global log-likelihood 
ratio $\{u_t\}$ or more precisely its local components $\{u_t^i\}$ coming 
from the sensors. Our goal in the next sections will be to 
propose efficient approximations for these processes that will 
replace them in the definition of SPRT, thus giving rise to 
an SPRT-like test. The efficiency of this test will then be 
compared against the optimum SPRT in order to assess its 
asymptotic optimality. 

IV. Decentralized Sequential Testing in 
Continuous Time 

Since we are in the continuous time case, $t$ is real, taking 
values in $[0, \infty)$. Let us assume, without for the moment 
explaining how, that the fusion center is capable of reproducing 
exactly the local log-likelihood ratio $u_t^i$ at the sampling 
instants $t = \tau_n^i$, by using only the received information $\{z_n^i\}$ 
from sensor $i$. It then makes sense to approximate $u_t^i$ between 
sampling times with its most recently reproduced value. In 
order to write this more formally, we recall that $m_t^i$ denotes the 
number of samples transmitted by sensor $i$ up to time $t$. Thus, 
at time $t$, $\tau_{m_t^i}^i$ is the most recent sampling time and $u_{\tau_{m_t^i}^i}^i$ the 
most recently reproduced log-likelihood value. Our suggestion 
is to approximate $u_t^i$ with $\tilde{u}_t^i = u_{\tau_{m_t^i}^i}^i$. We emphasize that we 
have exact equality between $\tilde{u}_t^i$ and $u_t^i$ at $t = \tau_n^i$, because 
we assume that the fusion center is capable of reproducing 
exactly the corresponding log-likelihood ratio at the sampling 
times $\{\tau_n^i\}$. 

Then, the fusion center can produce an approximation $\tilde{u}_t$ 
for the global log-likelihood ratio $u_t$ by summing the available 
local approximations 

$$\tilde{u}_t = \sum_{i=1}^{K} \tilde{u}_t^i = \sum_{i=1}^{K} u_{\tau_{m_t^i}^i}^i, \quad 0 \le t < \infty. \qquad (21)$$

Unlike the local approximation $\tilde{u}_t^i$, which is exact at $t = \tau_n^i$, 
the global approximation $\tilde{u}_t$ can be exactly equal to $u_t$ at a 
sampling instant only if all sensors transmit synchronously; 
otherwise $\tilde{u}_t$ and $u_t$ will be different. 

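Replacing now $\{u_t\}$ with $\{\tilde{u}_t\}$ in the definition of SPRT in 
(11), we obtain an SPRT-like test of the form 

$$\tilde{\mathcal{S}} = \inf\{t \ge 0 : \tilde{u}_t \notin (-A, B)\}, \qquad d_{\tilde{\mathcal{S}}} = \begin{cases} 1 & \text{if } \tilde{u}_{\tilde{\mathcal{S}}} \ge B \\ 0 & \text{if } \tilde{u}_{\tilde{\mathcal{S}}} \le -A, \end{cases} \qquad (22)$$

where again the thresholds $A, B > 0$ are selected to satisfy 
the error probability constraints with equality. The test we 
just described constitutes the fusion center policy we propose 
under the decentralized setup. Let us now explain how the 
fusion center can make an exact reproduction of the local 
log-likelihood ratios. 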



A. Lebesgue Sampling as a Quantization Strategy 

Of course the simplest way the fusion center can reproduce 
the log-likelihood ratio, is by receiving the corresponding 
value directly from the sensor. However this would require 
a communication protocol that is not limited to 1-bit informa- 
tion. The interesting point is that, after careful consideration, 
the 1-bit communication constraint can be satisfied in the case 
of Lebesgue sampling. 



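Recalling that $\{\tau_n^i\}$ denotes the sequence of sampling times 
for sensor $i$, we have that the local log-likelihood ratio at time 
$\tau_n^i$ can be written as 

$$u_{\tau_n^i}^i = \sum_{k=1}^{n} \big(u_{\tau_k^i}^i - u_{\tau_{k-1}^i}^i\big), \qquad (23)$$

suggesting that the fusion center only needs the increments 
$u_{\tau_n^i}^i - u_{\tau_{n-1}^i}^i$ in order to recover the exact value $u_{\tau_n^i}^i$ at the 
sampling instant $\tau_n^i$. When $\{u_t^i\}$ has continuous paths and we 
adopt the Lebesgue sampling scheme, we observe that 
these increments can take only the two values $-\underline{\Delta}_i$ 
or $\overline{\Delta}_i$, since the process $u_t^i - u_{\tau_{n-1}^i}^i$ will hit at the time of 
sampling one of the two thresholds, due to path continuity. By 
assuming that the values $\underline{\Delta}_i, \overline{\Delta}_i$ are selected beforehand and 
are made available to the fusion center, it then becomes easy 
to communicate the exact value of the increment $u_{\tau_n^i}^i - u_{\tau_{n-1}^i}^i$ 
by simply transmitting the following 1-bit information 

$$z_n^i = \begin{cases} 1 & \text{if } u_{\tau_n^i}^i - u_{\tau_{n-1}^i}^i = \overline{\Delta}_i \\ 0 & \text{if } u_{\tau_n^i}^i - u_{\tau_{n-1}^i}^i = -\underline{\Delta}_i. \end{cases} \qquad (24)$$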

The fusion center, using the sequence $\{z_n^i\}$ and (23), can 
reproduce $u_t^i$ exactly at the sampling times and then form 
$\tilde{u}_t$, which is required in the SPRT-like test defined in (22). 
Actually, with this particular communication protocol it is 
possible to update directly the test statistic $\tilde{u}_t$, without passing 
through the local statistics $\tilde{u}_t^i$. Indeed, every time the fusion 
center receives the 1-bit information $z_n^i$ from sensor $i$, it must 
simply add to the existing $\tilde{u}_t$ either $-\underline{\Delta}_i$ or $\overline{\Delta}_i$, depending on 
$z_n^i$ being 0 or 1 respectively. This observation suggests that 
the process $\{\tilde{u}_t\}$ is piecewise constant, exhibiting jumps every 
time the fusion center receives information from one or more 
sensors. 
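
The direct update of the test statistic described above, combined with the stopping rule of (22), can be sketched as follows. This is a minimal illustration under assumptions made here: the received bits are represented as time-sorted tuples `(time, sensor_index, bit)`, and the function name `d_sprt_fusion` is hypothetical. Bits that arrive at exactly the same time are accumulated before the thresholds are checked, so that the statistic jumps once per time instant.

```python
from itertools import groupby


def d_sprt_fusion(events, delta_low, delta_high, A, B):
    """Fusion-center side of a D-SPRT-style test.

    events: iterable of (time, sensor_index, bit), sorted by time.
    delta_low[i], delta_high[i]: local thresholds of sensor i, assumed
    known to the fusion center.
    Returns (stopping_time, decision), or (None, None) if the events are
    exhausted before the statistic leaves (-A, B).
    """
    u_tilde = 0.0
    for time, batch in groupby(events, key=lambda e: e[0]):
        # Add the increment encoded by each bit received at this instant.
        for _, sensor, bit in batch:
            u_tilde += delta_high[sensor] if bit == 1 else -delta_low[sensor]
        if u_tilde >= B:
            return time, 1
        if u_tilde <= -A:
            return time, 0
    return None, None


events = [(0.7, 0, 1), (1.1, 1, 1), (1.9, 0, 0), (2.4, 1, 1), (3.0, 0, 1)]
print(d_sprt_fusion(events, [2.0, 2.0], [2.0, 2.0], A=5.0, B=5.0))
```

Note that only the bits enter the update; the arrival times are used solely to order the events, mirroring the fact that the statistic ignores the intersampling periods.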

Lebesgue sampling in conjunction with the stopping and 
decision mechanism defined in (22) gives rise to the 
Decentralized Sequential Probability Ratio Test. This is in fact 
the continuous time version of the scheme suggested in [12] 
and constitutes the test that will be at the center of our 
attention. We emphasize that the D-SPRT is a valid 
decentralized sequential test since communication is limited to 1-bit 
data. Before examining the optimality characteristics of the 
D-SPRT, let us identify certain important properties of this 
detection structure: 

• Lebesgue sampling at each sensor can be seen as a local 
repeated SPRT with thresholds $\underline{\Delta}_i, \overline{\Delta}_i$. Using (5) and 
(19) one can also prove that 

$$-\underline{\Delta}_i = \log \frac{P_1(z_n^i = 0)}{P_0(z_n^i = 0)}; \qquad \overline{\Delta}_i = \log \frac{P_1(z_n^i = 1)}{P_0(z_n^i = 1)}. \qquad (25)$$

Consequently, for the update of the estimate $\tilde{u}_t$, the fusion 
center uses the log-likelihood ratio of the received bits $z_n^i$. 

• The local thresholds $\underline{\Delta}_i, \overline{\Delta}_i$ control the average 
intersampling period, which is an increasing function of these 
two parameters. Recalling that we have two different 
hypotheses, we understand that the average intersampling 
period will depend on the true hypothesis. If we require 
the two average periods to have specific prescribed values 
then, using (13) (or (19) if we want to specify the K-L 
divergence) and (14), we can uniquely identify the local 
thresholds for the Brownian motion or the Ito process 
case. In other data models, the two thresholds can be 
specified using simulations. 

• From the definition of the Lebesgue sampling scheme it 
is easy to see that $|u_t^i - \tilde{u}_t^i| \le \underline{\Delta}_i + \overline{\Delta}_i$, $t \ge 0$, suggesting 
that 

$$|u_t - \tilde{u}_t| \le C = \sum_{i=1}^{K} (\underline{\Delta}_i + \overline{\Delta}_i), \quad t \ge 0. \qquad (26)$$

Thus, at any time $t$, the "approximate" log-likelihood 
ratio $\tilde{u}_t$ differs from the "true" log-likelihood ratio $u_t$ 
at most by the constant $C$. 

• As we argued above, $\{\tilde{u}_t\}$ is piecewise constant. Assuming 
it is right continuous with left limits, the difference 
$\tilde{u}_t - \tilde{u}_{t-}$ expresses the possible jump in the process at 
time $t$. The largest jump in absolute value occurs when 
all sensors communicate at the same time and transmit 
data of the same sign. It is easy to verify that the maximal 
jump can also be bounded by 

$$|\tilde{u}_t - \tilde{u}_{t-}| \le C, \qquad (27)$$

where $C$ is defined in (26). 
• We recall that, in addition to the data sequence $\{z_n^i\}$, 
each sensor transmits indirectly to the fusion center the 
sequence $\{\delta_n^i = \tau_n^i - \tau_{n-1}^i\}$ of intersampling periods. 
As we argued before, the pairs $(z_n^i, \delta_n^i)$ constitute the 
complete set of information received by the fusion center, 
generating the filtration $\{\mathcal{G}_t\}$. It is also evident that 
the statistics of $(z_n^i, \delta_n^i)$ differ under each hypothesis, 
suggesting that both components of the pair may carry 
information about the true hypothesis. We realize however 
that D-SPRT makes use only of the data $\{z_n^i\}$, 
ignoring completely the intersampling periods $\{\delta_n^i\}$. Even 
though this information dropout inflicts a performance 
loss, it turns out that it is practically advantageous. Indeed 
any efficient use of the pair $(z_n^i, \delta_n^i)$ would require the 
knowledge (or computation) of the corresponding joint 
pdf under the two hypotheses. Unfortunately, this is 
possible only for the Brownian motion model [17] and, 
even in this case, it is in the form of a complicated series 
expansion. 

B. Asymptotic Optimality of the D-SPRT 

Let us now establish a strong asymptotic optimality property 
for D-SPRT in continuous time. This is the goal of our next 
theorem. 

Theorem 1: Suppose that $(\tilde{\mathcal{S}}, d_{\tilde{\mathcal{S}}})$ is the D-SPRT test defined 
in (22), with thresholds $A, B$ selected to satisfy the error 
probability constraints in (5) with equality, then 

$$A \le |\log \beta| + C; \qquad B \le |\log \alpha| + C. \qquad (28)$$

Furthermore, D-SPRT is asymptotically optimum of order-2 in 
the case of Problem 1 and Problem 2 with Brownian motion 
signals with constant drifts and in the case of Problem 2 with 
Ito processes. 



Proof: To prove (28) we apply a change of measures and 
use (26); this yields 

$$\beta = P_1(\tilde{u}_{\tilde{\mathcal{S}}} \le -A) = E_0\big[e^{u_{\tilde{\mathcal{S}}}} 1_{\{\tilde{u}_{\tilde{\mathcal{S}}} \le -A\}}\big] = E_0\big[e^{\tilde{u}_{\tilde{\mathcal{S}}} + (u_{\tilde{\mathcal{S}}} - \tilde{u}_{\tilde{\mathcal{S}}})} 1_{\{\tilde{u}_{\tilde{\mathcal{S}}} \le -A\}}\big] \le e^{-A + C}, \qquad (29)$$

which proves the first inequality in (28). Similarly we can 
show the second inequality. 

For order-2 optimality, we are going to prove only the case 
of Ito processes and Problem 2, since this reduces to 
Problem 1 in the case of Brownian motions with constant drifts. 
According to the second relation in (9), under hypothesis $H_0$ 
we need to prove that 

$$\big(-E_0[u_{\tilde{\mathcal{S}}}]\big) - \big(-E_0[u_{\mathcal{S}}]\big) = O(1). \qquad (30)$$

Note that the left hand side in (30) is always nonnegative 
since the SPRT, by being optimum, delivers the smallest 
K-L divergence. Consequently what is left to show is that the 
difference can be upper bounded by a constant. 

Recall that $\{\tilde{u}_t\}$ is piecewise constant, therefore stopping 
can occur only with a jump. According to (27) the jumps of 
this process cannot exceed the bound $C$ defined in (26). Before 
stopping, the process $\tilde{u}_t$ takes values in the interval $(-A, B)$; 
consequently, after stopping, we have $\tilde{u}_{\tilde{\mathcal{S}}} \ge -A - C$. Using 
this observation, (26) and (28), we can write 

$$E_0[u_{\tilde{\mathcal{S}}}] \ge E_0[\tilde{u}_{\tilde{\mathcal{S}}}] - C \ge (-A - C) - C \ge -|\log \beta| - 3C. \qquad (31)$$

From (19) we have that the performance of the SPRT, as 
$\alpha, \beta \to 0$, satisfies $-E_0[u_{\mathcal{S}}] = |\log \beta| - \alpha |\log \beta| + o(1)$. 
Normally $\alpha$ and $\beta$ are selected to have the same order 
of magnitude, yielding $\alpha |\log \beta| = o(1)$; however, for the 
validity of our theorem we can even tolerate cases where 
$\alpha |\log \beta| = O(1)$, that is, cases where $\alpha$ and $\beta$ are of 
drastically different orders of magnitude (e.g. $\beta = c / |\log \alpha|$). 
Consequently, assuming that $\alpha$ and $\beta$ converge to 0 so that 
$\alpha |\log \beta| + \beta |\log \alpha| = O(1)$, if in (31) we replace $|\log \beta|$ 
with the optimal SPRT performance, this proves (30) under 
$H_0$. Adopting similar arguments for the upper threshold $B$, 
we can prove the analogous relation under $H_1$. This concludes the proof. ■ 

C. Simulation Experiments 

We now present a simulation experiment in the context of Problem 1 with continuous time observations defined as in (12). Specifically, each sensor observes a standard Wiener process under $H_0$ and a Brownian motion with a constant drift under $H_1$. We consider the case of K = 2 sensors with the two constant drifts under $H_1$ having the values $\mu_1 = \mu_2 = 1$.

We compare the D-SPRT against the continuous time (centralized) SPRT, the discrete time (centralized) SPRT and Mei's [11] decentralized test. The last two tests are applied to discrete time data that are generated with canonical deterministic sampling. For the comparison to be fair, we must equate the average intersampling periods of the Lebesgue sampling with the constant period h of the canonical deterministic sampling. Selecting the local thresholds to have values $\underline{\Delta}_i = \overline{\Delta}_i = 2$ yields $E_0[\tau_1^i] = E_1[\tau_1^i] = 3.0464$, which must also become the value for the period of the deterministic sampling, namely h = 3.0464. In Fig. 2 we can see that the distance between the D-SPRT and the optimal performance remains bounded, which agrees with the order-2 asymptotic optimality result of Theorem 1. Mei's scheme on the other hand, known to be order-1 asymptotically optimum (see [11]), exhibits performance that slowly diverges from the optimum.

Fig. 2. Relative performance of centralized and decentralized schemes in continuous time with K = 2 sensors and testing between $H_0$: Brownian motions with drift 0 and $H_1$: Brownian motions with drift 1. [Plot omitted; curves: SPRT (continuous), SPRT (discrete), Mei's scheme, Proposed; horizontal axis: Type-I = Type-II error.]

The other important conclusion that we can draw from 
our graph is that the D-SPRT exhibits a distinct performance 
improvement over the discrete time SPRT which is applied 
after canonical deterministic sampling. We recall that this 
algorithm is optimum in discrete time but under the continuous 
time setup it is asymptotically optimum of order-2. As we 
argued in the Introduction, Lebesgue sampling is preferable 
to canonical deterministic sampling from a practical point 
of view since it does not require synchronization. Motivated 
by our simulations we can also conjecture that, even under 
the centralized setup, this form of sampling delivers better 
performance than canonical deterministic sampling. 

V. Decentralized Sequential Testing in Discrete Time

We consider the same formulation as in Section IV, only now time t is discrete with $t \in \mathbb{N}$. At each sensor i, the process $\{\xi_t^i\}$ is i.i.d. under the two hypotheses with corresponding cdfs $F_j^i(x)$, $j = 0, 1$. Denoting with $\ell_t^i = \log\big(dF_1^i(\xi_t^i)/dF_0^i(\xi_t^i)\big)$ the local log-likelihood ratio of the sample $\xi_t^i$ and assuming that $P_j(\ell_t^i \ne 0) > 0$, in other words that the two densities are not equal with probability 1, we have that the global log-likelihood ratio $u_t$ is given by

$$u_t = \sum_{i=1}^{K} u_t^i = \sum_{i=1}^{K} \sum_{k=1}^{t} \ell_k^i. \tag{32}$$



When this definition of $u_t$ is used in (11), the corresponding SPRT is optimum in the sense of Problems 1 and 2, provided that the two thresholds A, B are selected to satisfy the probability constraints in (5) with equality. We recall that, in discrete time, apart from the i.i.d. case, there is no other data model for which we know the solution for Problem 2 (i.e. there is no equivalent to the Ito processes case).

The centralized SPRT will again become the point of reference for any decentralized test; it is therefore necessary to quantify its performance. Unfortunately in discrete time there are no exact expressions as in continuous time, and we therefore need to resort to asymptotic formulas and bounds. For the performance of SPRT we have [2, Page 21] the following lower bounds

$$-E_0[u_{\mathcal{S}}] \ge H(\alpha, \beta) = |\log\beta| + o(1), \qquad E_1[u_{\mathcal{S}}] \ge H(\beta, \alpha) = |\log\alpha| + o(1), \tag{33}$$

which replace the exact equalities of the continuous time and continuous path case depicted in (18).

Let us now introduce a very important element in our analysis which will allow us to connect the discrete with the continuous time version presented in the previous section. We will assume that the "size" of all local log-likelihood ratios can be quantified, in an order of magnitude sense, by a finite parameter $\theta$. Normally $\theta = 1$, meaning that we regard the corresponding log-likelihood ratios as being of nominal size. Here however we would like to include an additional dimension into our analysis by relating the size of the log-likelihood ratio to the error levels $\alpha, \beta$. If for example the samples are generated by sampling a continuous time process, then $\theta$ can be directly related to the sampling period. Our goal is to show that, for sufficiently "small" samples, D-SPRT enjoys the same order-2 asymptotic optimality property as its continuous time counterpart. The actual size $\theta$ that can assure this interesting result, as we will show, decreases to 0, but at a much slower rate than the two error levels $\alpha, \beta$. This suggests that with small changes in $\theta$ (coming for instance from a mild oversampling of a continuous time process) we can obtain significant performance gains.

It is clear that our intention is to apply the same D-SPRT scheme we introduced in the continuous time case, namely Lebesgue sampling combined with an SPRT-like test where we approximate properly the global log-likelihood ratio $u_t$. Unfortunately this transfer from continuous to discrete time is not as straightforward as one might expect. The main reason is that with Lebesgue sampling we are no longer able to reproduce exactly the local log-likelihood ratios at the sampling times, because of the overshoot effect occurring at the local SPRT. This difference is responsible for a substantial complication in the corresponding discrete time analysis.

The overshoot is of course directly related to the size of the local log-likelihood ratio of each sample. Since the overshoot plays a very important role in our analysis, it is more convenient to let $\theta$ capture the overshoot size and then, through proper conditions, to examine how $\theta$ relates to the log-likelihood ratio.

Finally, in order to avoid unnecessary complications, we will limit ourselves to the case where the two error levels $\alpha, \beta$ decrease to 0 at the same rate, meaning that the ratio $\alpha/\beta$ is uniformly bounded away from 0 and $\infty$ (or, according to our definitions, $\beta = \Theta(\alpha)$).



IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. , NO. , 2009 (SUBMITTED) 



A. Lebesgue Sampling and D-SPRT in Discrete Time 

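In each sensor i, the Lebesgue sampling scheme defined for the continuous time case produces a sequence $\{\tau_n^i\}$ of $\{\mathcal{F}_t^i\}$-adapted stopping times, only now, due to the overshoot effect, the local SPRT statistic $u_t^i - u^i_{\tau_{n-1}^i}$ does not necessarily hit the two thresholds. Consequently the information sent over the channel can express only the side by which the statistic $u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}$ exits the interval $(-\underline{\Delta}_i, \overline{\Delta}_i)$, more precisely

$$z_n^i = \begin{cases} 1 & \text{if } u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} \ge \overline{\Delta}_i, \\ 0 & \text{if } u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} \le -\underline{\Delta}_i, \end{cases} \tag{34}$$

which is the equivalent of (24).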

At this point it is only natural to ask how the fusion center should utilize the sequence $\{z_n^i\}$. In the continuous time and continuous path case, we recall that the fusion center, in view of (25), uses the log-likelihood ratio of the received bits $z_n^i$ to update the estimate $\tilde{u}_t$. Consequently, in discrete time it seems only natural to use the same idea and define (as was also originally suggested in [12]) the following two quantities for each sensor

$$\overline{\Lambda}_i = \log\frac{P_1(z_n^i = 1)}{P_0(z_n^i = 1)}; \qquad \underline{\Lambda}_i = -\log\frac{P_1(z_n^i = 0)}{P_0(z_n^i = 0)}. \tag{35}$$

Both values $\underline{\Lambda}_i, \overline{\Lambda}_i$ can be precomputed either by simulations or numerically and made known to the fusion center.
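As an illustration of the precomputation by simulation, the following minimal sketch estimates the two quantities in (35) by Monte Carlo. It assumes the Gaussian model of the simulation section (samples $\mathcal{N}(0,h)$ under $H_0$ and $\mathcal{N}(h,h)$ under $H_1$, so that the log-likelihood ratio increment is $\xi - h/2$); the function names, the number of runs and the seed are hypothetical choices made only for this example.

```python
import math
import random

def local_sprt_bit(rng, h, drift, delta_dn, delta_up):
    """Run one local SPRT; return 1 for an upper exit and 0 for a lower exit."""
    u = 0.0
    while -delta_dn < u < delta_up:
        xi = rng.gauss(drift * h, math.sqrt(h))  # N(0,h) under H0, N(h,h) under H1
        u += xi - 0.5 * h                        # log-likelihood ratio of the sample
    return 1 if u >= delta_up else 0

def estimate_lambdas(h, delta_dn, delta_up, runs=20000, seed=0):
    """Monte Carlo estimate of (Lambda_lower, Lambda_upper) as defined in (35)."""
    rng = random.Random(seed)
    p0 = sum(local_sprt_bit(rng, h, 0.0, delta_dn, delta_up) for _ in range(runs)) / runs
    p1 = sum(local_sprt_bit(rng, h, 1.0, delta_dn, delta_up) for _ in range(runs)) / runs
    lam_up = math.log(p1 / p0)                    # log P1(z=1)/P0(z=1)
    lam_dn = -math.log((1.0 - p1) / (1.0 - p0))   # -log P1(z=0)/P0(z=0)
    return lam_dn, lam_up
```

For example, `estimate_lambdas(1.0, 1.0, 1.0)` returns two estimates that should both exceed the local threshold 1, since the overshoot makes each transmitted bit more informative than the threshold alone suggests.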

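As we argued above, we are interested in the sequence of overshoots $\{\eta_n^i\}$, where

$$\eta_n^i = \big(u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} + \underline{\Delta}_i\big) 1_{\{u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} \le -\underline{\Delta}_i\}} + \big(u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} - \overline{\Delta}_i\big) 1_{\{u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} \ge \overline{\Delta}_i\}}. \tag{36}$$

The maximal average overshoot size is a parameter that plays a very important role in our analysis. We define it as

$$\theta = \max_{j} \max_{i} E_j\big[|\eta_n^i|\big], \tag{37}$$

and we know [19] that it is finite if $E_j[(\ell_1^i)^2] < \infty$, $j = 0, 1$, $i = 1, \ldots, K$.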

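In the continuous time and continuous path case, since there is no overshoot, the thresholds $\underline{\Delta}_i, \overline{\Delta}_i$ coincide with the quantities $\underline{\Lambda}_i, \overline{\Lambda}_i$. In discrete time this is no longer true. The next lemma quantifies their relative size.

Lemma 1: Let $\underline{\Delta}_i, \overline{\Delta}_i > 0$ denote the thresholds for the local SPRT and $\underline{\Lambda}_i, \overline{\Lambda}_i$ be defined as in (35), then

$$\underline{\Lambda}_i \ge \underline{\Delta}_i; \qquad \overline{\Lambda}_i \ge \overline{\Delta}_i, \tag{38}$$

$$\underline{\Lambda}_i = \underline{\Delta}_i + O(\theta); \qquad \overline{\Lambda}_i = \overline{\Delta}_i + O(\theta). \tag{39}$$

Proof: The proof is presented in the Appendix. ■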
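The fusion center, every time it receives an information bit $z_n^i$, updates its existing statistic $\tilde{u}_t$ by adding either $-\underline{\Lambda}_i$ when $z_n^i = 0$ or $\overline{\Lambda}_i$ when $z_n^i = 1$. Recalling that $m_t^i$ denotes the number of bits transmitted by sensor i up to time t, we can write for the D-SPRT statistic that $\tilde{u}_t = \sum_{i=1}^{K} \tilde{u}_t^i$ where

$$\tilde{u}_t^i = \sum_{n=1}^{m_t^i} \lambda_n^i; \quad \text{with } \lambda_n^i = -\underline{\Lambda}_i 1_{\{z_n^i = 0\}} + \overline{\Lambda}_i 1_{\{z_n^i = 1\}}. \tag{40}$$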
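The update in (40) can be sketched in a few lines. The following is only an illustration, assuming that the fusion center stops, as in an SPRT, the first time its statistic leaves the interval $(-A, B)$; the function name and the message format (pairs of sensor index and bit, in order of arrival) are hypothetical.

```python
def d_sprt_fusion(messages, lam_dn, lam_up, A, B):
    """Process 1-bit messages at the fusion center.

    messages : iterable of (sensor_index, bit) pairs in order of arrival
    lam_dn, lam_up : per-sensor values of Lambda_lower and Lambda_upper from (35)
    A, B : global thresholds of the SPRT-like test
    Returns (decision, messages_used); decision is None if no threshold was crossed.
    """
    u = 0.0      # D-SPRT statistic, piecewise constant between messages
    used = 0
    for i, z in messages:
        u += lam_up[i] if z == 1 else -lam_dn[i]   # increment lambda_n^i of (40)
        used += 1
        if u >= B:
            return 1, used      # decide in favor of H1
        if u <= -A:
            return 0, used      # decide in favor of H0
    return None, used
```

Because the statistic changes only when a bit arrives, no clock synchronization between the sensors and the fusion center is needed in this sketch.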



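The K-L information numbers of the sequence $\{\lambda_n^i\}$ also play an important role in our analysis. We have the following estimates, given in the next lemma.

Lemma 2: For the K-L information numbers of the sequence $\{\lambda_n^i\}$ we can write

$$\tilde{I}_0^i = -E_0[\lambda_1^i] \ge \frac{\underline{\Delta}_i (e^{\overline{\Delta}_i} - 1) + \overline{\Delta}_i (e^{-\underline{\Delta}_i} - 1)}{e^{\overline{\Delta}_i} - e^{-\underline{\Delta}_i}} > 0,$$

$$\tilde{I}_1^i = E_1[\lambda_1^i] \ge \frac{\overline{\Delta}_i (e^{\underline{\Delta}_i} - 1) + \underline{\Delta}_i (e^{-\overline{\Delta}_i} - 1)}{e^{\underline{\Delta}_i} - e^{-\overline{\Delta}_i}} > 0. \tag{41}$$

Additionally, if $\underline{\Delta}_i, \overline{\Delta}_i \to \infty$ in such a way that $\underline{\Delta}_i / \overline{\Delta}_i$ is bounded away from 0 and $\infty$ (i.e. $\underline{\Delta}_i = \Theta(\overline{\Delta}_i)$), the previous expressions simplify to

$$\tilde{I}_0^i \ge \underline{\Delta}_i + o(1); \qquad \tilde{I}_1^i \ge \overline{\Delta}_i + o(1). \tag{42}$$

Proof: The proof is presented in the Appendix. ■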
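The following sketch checks numerically the closed form behind Lemma 2. It uses the function $K(x, y) = \{x(e^y - 1) + y(e^{-x} - 1)\}/(e^y - e^{-x})$ that appears in the Appendix proof, and compares it with the K-L information numbers computed directly from the bit probabilities. The function names and the probabilities used in the usage example are hypothetical.

```python
import math

def K(x, y):
    """Function K(x, y) from the proof of Lemma 2."""
    return (x * (math.exp(y) - 1.0) + y * (math.exp(-x) - 1.0)) / (math.exp(y) - math.exp(-x))

def kl_numbers_from_bits(p0, p1):
    """K-L information numbers of the bit increments, with p_j = P_j(z = 1)."""
    lam_up = math.log(p1 / p0)                    # Lambda_upper of (35)
    lam_dn = -math.log((1.0 - p1) / (1.0 - p0))   # Lambda_lower of (35)
    info0 = lam_dn * (1.0 - p0) - lam_up * p0     # -E_0[lambda]
    info1 = lam_up * p1 - lam_dn * (1.0 - p1)     #  E_1[lambda]
    return lam_dn, lam_up, info0, info1
```

With `lam_dn, lam_up, info0, info1 = kl_numbers_from_bits(0.1, 0.7)`, the values `info0` and `info1` agree with `K(lam_dn, lam_up)` and `K(lam_up, lam_dn)` respectively, and for large equal arguments `K(x, x)` approaches `x`, in line with (42).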
The analysis of the classical SPRT algorithm relies on Wald's (second) identity. In order to be able to analyze the D-SPRT, it turns out that we need an equivalent result. The next lemma introduces a version of Wald's second identity that is suitable for our needs.

Lemma 3: Let $\{\tau_n^i\}$ denote the sequence of sampling times generated by the Lebesgue sampling scheme in sensor i. Consider a sequence $\{\zeta_n^i\}$ of i.i.d. random variables where each $\zeta_n^i$ is a function of the samples $\xi^i_{\tau_{n-1}^i + 1}, \ldots, \xi^i_{\tau_n^i}$ acquired by the sensor during the nth intersampling period, and assume $E_j[|\zeta_1^i|] < \infty$. If T denotes any $\{\mathcal{F}_t\}$-adapted stopping time which is a.s. finite with finite expectation and $m_T^i$ is the number of sampling times that occurred up to and including time T then, for $j = 0, 1$ we have

$$E_j\Big[\sum_{n=1}^{m_T^i + 1} \zeta_n^i\Big] = E_j[\zeta_1^i]\,\big(E_j[m_T^i] + 1\big). \tag{43}$$

Proof: The proof is presented in the Appendix. ■
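A quick Monte Carlo sanity check of (43) can be sketched as follows. The sketch takes $\zeta_n^i$ to be the length of the nth intersampling period, so that the sum up to $m_T^i + 1$ equals the first sampling time after T, and uses a deterministic T. The Gaussian model under $H_0$, the parameter values and the function names are assumptions made only for this illustration.

```python
import math
import random

def next_period(rng, h, delta):
    """Length of one intersampling period of the local SPRT under H0."""
    u, steps = 0.0, 0
    while -delta < u < delta:
        u += rng.gauss(0.0, math.sqrt(h)) - 0.5 * h   # increment xi - h/2, xi ~ N(0, h)
        steps += 1
    return steps

def check_identity(T=20, h=1.0, delta=1.0, runs=5000, seed=1):
    """Return Monte Carlo estimates of the two sides of (43)."""
    rng = random.Random(seed)
    mean_zeta = sum(next_period(rng, h, delta) for _ in range(50000)) / 50000
    lhs_total, m_total = 0.0, 0
    for _ in range(runs):
        t, m = 0, 0
        while True:
            t += next_period(rng, h, delta)   # t is now the (m+1)-th sampling time
            if t > T:
                break
            m += 1                            # that sampling time occurred by time T
        lhs_total += t                        # sum of the first m_T + 1 period lengths
        m_total += m
    lhs = lhs_total / runs
    rhs = mean_zeta * (m_total / runs + 1.0)
    return lhs, rhs
```

Calling `check_identity()` returns two estimates that should agree up to Monte Carlo error.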
One might wonder why it is necessary to set the upper limit in (43) to $m_T^i + 1$ instead of the classical $m_T^i$ we encounter in Wald's original identity. Unfortunately, if the upper limit is replaced by $m_T^i$ then in the proof (specifically in (63)) the random variable $\zeta_n^i$ will be combined with $1_{\{m_T^i \ge n\}}$ instead of $1_{\{m_T^i \ge n-1\}}$. As it turns out, these two quantities are not necessarily independent, as is the case between $\zeta_n^i$ and $1_{\{m_T^i \ge n-1\}}$, and therefore Wald's identity cannot be assured.

If we change the upper limit to $m_T^i$ then we can write two useful estimates that are an immediate consequence of Lemma 3 and are presented, without proof, in the next corollary.

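Corollary 1: Let $\{\zeta_n^i\}$, T and $m_T^i$ be as in Lemma 3, then
i). For $\zeta_n^i \ge 0$ we have

$$E_j\Big[\sum_{n=1}^{m_T^i} \zeta_n^i\Big] \le E_j[\zeta_1^i]\,\big(E_j[m_T^i] + 1\big). \tag{44}$$

ii). If $\{\zeta_n^i\}$ is a sequence with $|\zeta_n^i| \le M < \infty$ for all n, then

$$\Big| E_j\Big[\sum_{n=1}^{m_T^i} \zeta_n^i\Big] - E_j[\zeta_1^i]\, E_j[m_T^i] \Big| \le 2M. \tag{45}$$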
Unlike in continuous time, due to the overshoot effect, there is now an accumulation of errors which results in the difference $u_t - \tilde{u}_t$ being unbounded and no longer limited by a constant. However, by properly selecting the local thresholds, we will see that we can force this difference to grow at a much slower pace than each of its components $u_t, \tilde{u}_t$. In turn this possibility will allow us to prove interesting asymptotic optimality properties for the D-SPRT in discrete time. Since the difference of the two statistics plays a crucial role in our analysis, with the next lemma we obtain an estimate of its size.

Lemma 4: If $\{\eta_n^i\}$ is the sequence of overshoots generated by the Lebesgue sampling mechanism at sensor i, then for any $\{\mathcal{F}_t\}$-adapted stopping time T we have

$$E_j\big[|u_T - \tilde{u}_T|\big] \le \max_i E_j\big[|\eta_1^i|\big] \left( \frac{|E_j[\tilde{u}_T]| + 2C'}{\min_i \tilde{I}_j^i} + K \right) + C, \tag{46}$$

where $C = \sum_{i=1}^{K} (\underline{\Delta}_i + \overline{\Delta}_i)$ and $C' = \sum_{i=1}^{K} (\underline{\Lambda}_i + \overline{\Lambda}_i)$.

Proof: The proof makes use of Corollary 1 and it is presented in the Appendix. ■

B. Asymptotic Optimality 

We have concluded the presentation of the background material that is necessary for establishing our main optimality results. Before going to the next theorem, which introduces a key estimate for the performance of D-SPRT, we would like to introduce an additional quantity that expresses the order of magnitude of the local thresholds. We will assume that there exists a quantity $\Delta$ such that for all i we have $\underline{\Delta}_i = \Theta(\Delta)$ and $\overline{\Delta}_i = \Theta(\Delta)$. This is necessary because, in order to establish the desired asymptotic optimality property, at some point we will require the local thresholds to tend to infinity. With this assumption all local thresholds increase at the same rate. After this clarification we can now state our next key theorem.

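Theorem 2: Let $\mathcal{S}$, $\tilde{\mathcal{S}}$ denote the stopping times for the centralized SPRT and D-SPRT respectively; we then have the following estimate for the thresholds of D-SPRT

$$A \le |\log\beta|; \qquad B \le |\log\alpha|. \tag{47}$$

Additionally, for $j = 0, 1$, we can write

$$\big| E_0[u_{\tilde{\mathcal{S}}}] - E_0[u_{\mathcal{S}}] \big| \le \frac{\theta}{\Theta(\Delta)} |\log\beta| + \Theta(\Delta), \qquad \big| E_1[u_{\tilde{\mathcal{S}}}] - E_1[u_{\mathcal{S}}] \big| \le \frac{\theta}{\Theta(\Delta)} |\log\alpha| + \Theta(\Delta). \tag{48}$$

Proof: The proof is very technical and it is presented in sufficient detail in the Appendix. ■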
We note that (47) is the analog of (28) in discrete time. In fact it constitutes a better approximation than (28) but at the expense of a (significantly) more involved proof. Inequality (48) refers to the difference of the K-L divergences between the SPRT stopping time $\mathcal{S}$ and the D-SPRT stopping time $\tilde{\mathcal{S}}$. Since we are in the i.i.d. case we know that the K-L divergence is proportional to the expected delay and the proportionality factor is simply the K-L information number. Theorem 2 will be the starting point for establishing our asymptotic optimality results. Let us continue by first attempting to recover the continuous time analog.



Order-2 Asymptotic Optimality: Continuous time corresponds to "high sampling" or, in our terminology, to a size $\theta$ tending to 0. The question of course is what the rate of convergence of $\theta$ towards 0 should be, in order to assure the desired form of asymptotic optimality.

Assuming $\Delta = 1$, in other words that the local thresholds are of the order of a nominal constant, we realize from (48) that we need $\theta = O(1/|\log\alpha|) = O(1/|\log\beta|)$ to reduce the right hand side in (48) to a quantity of the order of a constant. In other words, as we decrease the two error probabilities $\alpha, \beta$ we also need to decrease the size of the overshoot. What is however worth emphasizing is that the rate by which the size of the overshoot needs to go to 0 is much slower than the rate of the error probabilities. This suggests that a small change in $\theta$ corresponds to a significant change in the error probabilities.

Order-1 Asymptotic Optimality: Of course the most crucial question is what happens if $\theta$ is considered nominal and we are allowed to play with the size $\Delta$ of the local thresholds. It is clear that in this case overly small local thresholds will induce frequent communication with the fusion center, thus resulting in rapid error accumulation due to the overshoot effect. If we go through the proof of Theorem 2 we realize that this part is captured by the first term in the right hand side of (48). If on the other hand we use overly large local thresholds then this will generate long detection delays due to infrequent communication with the fusion center and to coarse time resolution. This part is captured by the second term in (48). Clearly there is a compromise value for the local threshold size $\Delta$ that can optimize the performance of the test.

Attempting to discover the best threshold, consider the ratio

$$0 \le \frac{E_j[u_{\tilde{\mathcal{S}}}] - E_j[u_{\mathcal{S}}]}{E_j[u_{\mathcal{S}}]} \le \frac{\theta}{\Theta(\Delta)} + \frac{\Theta(\Delta)}{|\log\alpha|}. \tag{49}$$

If we set $\theta = 1$ and let $\Delta \to \infty$ but at a rate such that $\Delta/|\log\alpha| \to 0$ then the right hand side of (49) tends to 0, establishing order-1 asymptotic optimality. After some simple reasoning we can deduce that the best choice is $\Delta = \Theta(\sqrt{|\log\alpha|})$, which equates the two terms in (49), yielding

$$0 \le \frac{E_j[u_{\tilde{\mathcal{S}}}] - E_j[u_{\mathcal{S}}]}{E_j[u_{\mathcal{S}}]} \le O\left(\frac{1}{\sqrt{|\log\alpha|}}\right). \tag{50}$$

The value we obtained for $\Delta$ is the optimum local threshold size, expressed in an "order of magnitude" form. Observe also that the convergence rate to 0 of the right hand side in the previous expression is of the same order as the one obtained in [11].

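If we are now allowed to play with both the size $\theta$ of the overshoot and the local threshold size $\Delta$, then our previous result can be improved significantly. Indeed from (49) we can see that the optimum size for $\Delta$ is now $\Delta = \Theta(\sqrt{\theta |\log\alpha|})$, which yields

$$0 \le \frac{E_j[u_{\tilde{\mathcal{S}}}] - E_j[u_{\mathcal{S}}]}{E_j[u_{\mathcal{S}}]} \le O\left(\sqrt{\frac{\theta}{|\log\alpha|}}\right). \tag{51}$$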
Selecting $\theta$ to tend to 0 as a function of the error probability $\alpha$ makes the right hand side of the previous expression tend to zero faster than $1/\sqrt{|\log\alpha|}$.
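The trade-off between the two terms of (49) can be illustrated numerically. In the sketch below all constants hidden in the order-of-magnitude notation are set to 1, which is an assumption made purely for illustration; the function names are hypothetical.

```python
import math

def relative_loss_bound(delta, theta, abs_log_alpha):
    """Right hand side of (49) with all hidden constants set to 1 (an assumption)."""
    return theta / delta + delta / abs_log_alpha

def best_delta(theta, abs_log_alpha):
    """Threshold size that equates the two terms of the bound."""
    return math.sqrt(theta * abs_log_alpha)
```

For instance, with `theta = 0.25` and `abs_log_alpha = 100` the balancing threshold is `best_delta(0.25, 100) == 5.0`, and the bound there equals `2 * math.sqrt(theta / abs_log_alpha)`, which is the rate appearing in (51).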

This theoretical result has a very useful practical implication. Specifically, we deduce that selecting samples which generate smaller sized overshoots results in a D-SPRT performance improvement. As we will see in our simulations, this important characteristic is not enjoyed by Mei's decentralized scheme [11].



C. Relating the Log-likelihood ratio to the Overshoot 

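Before going to our simulations let us find a way to relate the overshoot size $E_j[|\eta_n^i|]$ to the log-likelihood ratio $\ell_t^i$ of a sample. Since our processes are stationary, we will consider only the case n = 1. Recall that $\tau_0^i = u_0^i = 0$, therefore

$$\tau_1^i = \inf\{t > 0 : u_t^i \notin (-\underline{\Delta}_i, \overline{\Delta}_i)\},$$

$$\eta_1^i = \big(u^i_{\tau_1^i} + \underline{\Delta}_i\big) 1_{\{u^i_{\tau_1^i} \le -\underline{\Delta}_i\}} + \big(u^i_{\tau_1^i} - \overline{\Delta}_i\big) 1_{\{u^i_{\tau_1^i} \ge \overline{\Delta}_i\}}. \tag{52}$$

Note now that we can write $\tau_1^i = \min\{\underline{\tau}^i, \overline{\tau}^i\}$ where

$$\underline{\tau}^i = \inf\{t > 0 : u_t^i \le -\underline{\Delta}_i\}; \qquad \overline{\tau}^i = \inf\{t > 0 : u_t^i \ge \overline{\Delta}_i\}. \tag{53}$$

Using these definitions the overshoot takes the form

$$\eta_1^i = \big(u^i_{\underline{\tau}^i} + \underline{\Delta}_i\big) 1_{\{\underline{\tau}^i < \overline{\tau}^i\}} + \big(u^i_{\overline{\tau}^i} - \overline{\Delta}_i\big) 1_{\{\overline{\tau}^i < \underline{\tau}^i\}}, \tag{54}$$

from which we can easily deduce that

$$E_j\big[|\eta_1^i|\big] \le E_0\big[-(u^i_{\underline{\tau}^i} + \underline{\Delta}_i)\big] + E_1\big[u^i_{\overline{\tau}^i} - \overline{\Delta}_i\big]. \tag{55}$$

From [19, Theorem 3] we have for $r \ge 1$ that

$$\sup_{\underline{\Delta}_i \ge 0} E_0\big[-(u^i_{\underline{\tau}^i} + \underline{\Delta}_i)\big] \le \left( \frac{r+2}{r+1} \cdot \frac{E_0\big[|\ell_1^i|^{r+1}\big]}{|E_0[\ell_1^i]|} \right)^{1/r}, \qquad \sup_{\overline{\Delta}_i \ge 0} E_1\big[u^i_{\overline{\tau}^i} - \overline{\Delta}_i\big] \le \left( \frac{r+2}{r+1} \cdot \frac{E_1\big[|\ell_1^i|^{r+1}\big]}{|E_1[\ell_1^i]|} \right)^{1/r}, \tag{56}$$

where we have used the fact that for a nonnegative random variable x and any $r \ge 1$ we have $E[x] \le (E[x^r])^{1/r}$.

We would like to point out that (56) with r = 1 is the most common selection for deriving bounds for the overshoot (see (21)). Unfortunately this value does not always produce upper bounds that tend to 0 when the corresponding log-likelihood size tends to 0. This is the reason why we had to resort to this more general form of upper bound.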

D. Simulation Experiments 

We illustrate our ideas by performing a simulation experiment with K = 2 sensors, each one observing a Brownian motion. The hypothesis testing problem we would like to solve is in the context of the problem defined in (12), that is, under $H_0$ we have a standard Wiener process in each sensor while under $H_1$ a Brownian motion with constant drift $\mu_i$. We select the two drifts to be equal to 1, that is, $\mu_1 = \mu_2 = 1$.

The continuous time processes are sampled using canonical deterministic sampling with a sampling period h, thus generating the discrete time sequence of Normally distributed samples $\{\xi_t^i\}$ in each sensor. Clearly under $H_0$ we have that $\xi_t^i \sim \mathcal{N}(0, h)$ whereas $\xi_t^i \sim \mathcal{N}(h, h)$ under the alternative hypothesis $H_1$.

The size of our samples is a function of the sampling period h and tends to 0 as $h \to 0$. Let us use (56) to verify that the overshoot tends to 0 as well. Forming the log-likelihood ratio we find $\ell_t^i = -0.5h + \xi_t^i$, and computing the upper bound in (56) for r = 1 yields $3(1 + 0.25h)$ which, clearly, does not converge to 0 when $h \to 0$. If however we select r = 2 then the upper bound turns out to be $O(h^{1/4})$, which tends to 0 with h. Consequently $h^{1/4}$ can play the role of $\theta$.

Fig. 3. Relative performance of centralized and decentralized tests in discrete time with K = 2 sensors and testing between $H_0$: Normal $\mathcal{N}(0, h)$ and $H_1$: Normal $\mathcal{N}(h, h)$ random variables with (a) h = 1.0 and (b) h = 0.1. [Plots omitted; horizontal axis: Type-I = Type-II error.]
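The two values quoted above can be checked numerically. The sketch below evaluates the upper bound of (56) under $H_0$, written as $\big(\tfrac{r+2}{r+1}\, E_0[|\ell|^{r+1}] / |E_0[\ell]|\big)^{1/r}$ with $\ell \sim \mathcal{N}(-h/2, h)$; the absolute moment is computed with a plain trapezoidal rule. The function names and the quadrature settings are hypothetical choices for this illustration.

```python
import math

def abs_moment(mean, var, p, n=20001, width=12.0):
    """E|X|^p for X ~ N(mean, var), by trapezoidal quadrature on mean +/- width*std."""
    std = math.sqrt(var)
    lo = mean - width * std
    step = 2.0 * width * std / (n - 1)
    total = 0.0
    for k in range(n):
        x = lo + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0      # trapezoid end-point weights
        pdf = math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
        total += weight * abs(x) ** p * pdf
    return total * step

def overshoot_bound(h, r):
    """Upper bound of (56) under H0 for the increment l = xi - h/2 with xi ~ N(0, h)."""
    moment = abs_moment(-0.5 * h, h, r + 1)
    return ((r + 2) / (r + 1) * moment / (0.5 * h)) ** (1.0 / r)
```

For r = 1 the function reproduces $3(1 + 0.25h)$, for example about 3.075 at h = 0.1, while for r = 2 reducing h by a factor of 100 reduces the bound by roughly $100^{1/4} \approx 3.16$.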

We compare the discrete time D-SPRT with the optimal discrete time SPRT and also with the test suggested by Mei in [11], which is asymptotically optimal of order-1. To confirm the close connection of the D-SPRT to the size of the samples (or the overshoot), we have selected two values for the sampling period, namely h = 1 and 0.1. For the local thresholds we also considered two values, specifically $\underline{\Delta}_i = \overline{\Delta}_i = \Delta = 1$ and 2.

Fig. 3 depicts the K-L divergence of the competing schemes. We recall that in this case the K-L divergence is proportional to the expected detection delay. We present the former measure instead of the latter because the K-L divergence is independent of the size of the samples, while the detection delay varies drastically with this quantity (smaller samples tend to need more time to reach the same threshold).

We observe that D-SPRT exhibits a notable performance improvement when we go from the value h = 1 to h = 0.1. This is in complete accordance with our previous analysis since h = 0.1 generates likelihood ratios and overshoots of smaller size than h = 1. The optimum SPRT on the other hand and Mei's scheme are relatively insensitive to this change of size in the samples. For D-SPRT, it is basically the error accumulation expressed through the difference $|u_t - \tilde{u}_t|$ that improves as we use smaller h, producing an overall performance improvement. What is also worth emphasizing for the D-SPRT is that the communication frequency (expressed in continuous time) between the sensors and the fusion center stays relatively unchanged under both values of h, while in the other two schemes it increases by a factor of 10.

Finally, in Fig. 3 we can also observe that the performance of the D-SPRT, as a function of the local threshold value $\underline{\Delta}_i = \overline{\Delta}_i = \Delta$, is not monotone. Indeed, case $\Delta = 2$ is better than $\Delta = 1$ for smaller values of $\alpha$. Additionally, the error probability values where $\Delta = 2$ prevails are increasing with the size of the samples. This performance can be explained by our analysis. We recall that the optimum local threshold is $\Theta(\sqrt{\theta |\log\alpha|})$, suggesting that the error probability where any specific $\Delta$ is optimum is roughly $\alpha = \Theta(\exp(-\Delta^2/\theta))$. Consequently, a larger threshold delivers better performance at a smaller error probability and this value is an increasing function of the size $\theta$ of the samples.

VI. Conclusion 

We have presented and rigorously analyzed a decentralized 
scheme for sequential hypothesis testing. The detection struc- 
ture relies on a local SPRT implemented at each sensor which 
is used for random sampling of the observed data stream. This 
sampling scheme naturally induces a 1-bit communication 
protocol between the sensors and the fusion center which is 
asynchronous, a very practically desirable characteristic. By 
performing a detailed analysis we were able to prove interest- 
ing asymptotic optimality properties for the proposed test and 
reveal its ability to improve performance when oversampling is 
used at the sensor level. Overall, our decentralized detection 
method exhibits performance that can be very close to the 
optimum centralized test, outperforming other decentralized 
tests of the literature. 

Appendix 

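Proof of Lemma 1: To prove the lemma note that

$$e^{-\underline{\Lambda}_i} = \frac{P_1(z_n^i = 0)}{P_0(z_n^i = 0)} = E_0\Big[e^{u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}} \,\Big|\, u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} \le -\underline{\Delta}_i\Big]. \tag{57}$$

Since

$$E_0\Big[e^{u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}} \,\Big|\, u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} \le -\underline{\Delta}_i\Big] \le e^{-\underline{\Delta}_i}, \tag{58}$$

this proves (38). For (39), using Jensen's inequality in (57), we can write

$$E_0\Big[e^{u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}} \,\Big|\, u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} \le -\underline{\Delta}_i\Big] \ge e^{-\underline{\Delta}_i} e^{-\vartheta}, \tag{59}$$

where

$$\vartheta = E_0\Big[-\big(u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} + \underline{\Delta}_i\big) \,\Big|\, u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} \le -\underline{\Delta}_i\Big] = \frac{E_0\Big[-\big(u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} + \underline{\Delta}_i\big) 1_{\{u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} \le -\underline{\Delta}_i\}}\Big]}{1 - P_0\big(u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i} \ge \overline{\Delta}_i\big)} \le \frac{\theta}{1 - e^{-\overline{\Delta}_i}}, \tag{60}$$

where in the last inequality we used the fact that the numerator is an overshoot and therefore bounded by $\theta$, and in the denominator we used Wald's approximation (which provides an upper bound) for the error probability of the local SPRT exiting from the wrong side. Replacing the bound for $\vartheta$ in (59), taking the logarithm and recalling (38) we conclude

$$0 \le \underline{\Lambda}_i - \underline{\Delta}_i \le \frac{\theta}{1 - e^{-\overline{\Delta}_i}}. \tag{61}$$

Assuming that $\overline{\Delta}_i$ is bounded away from 0, the previous right hand side becomes $O(\theta)$ and proves the lemma. ■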
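Proof of Lemma 2: Let us prove the first inequality in (41). Note that

$$\tilde{I}_0^i = -E_0[\lambda_1^i] = \frac{\underline{\Lambda}_i (e^{\overline{\Lambda}_i} - 1) + \overline{\Lambda}_i (e^{-\underline{\Lambda}_i} - 1)}{e^{\overline{\Lambda}_i} - e^{-\underline{\Lambda}_i}} > 0. \tag{62}$$

By direct differentiation we can verify that the function $K(x, y) = \{x(e^{y} - 1) + y(e^{-x} - 1)\}/(e^{y} - e^{-x})$ is monotonically increasing in both its arguments when $x, y \ge 0$. Consequently from (38), namely that $\underline{\Lambda}_i, \overline{\Lambda}_i$ exceed $\underline{\Delta}_i, \overline{\Delta}_i$ respectively, we immediately deduce the final inequality. Proving (42) is straightforward. ■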
Proof of Lemma 3: For simplicity we drop the subscript j that refers to the true hypothesis. We observe that

$$E\Big[\sum_{n=1}^{m_T^i + 1} \zeta_n^i\Big] = E\Big[\sum_{n=1}^{\infty} \zeta_n^i 1_{\{m_T^i \ge n-1\}}\Big]. \tag{63}$$

Note that $\{m_T^i \ge n-1\} = \{T \ge \tau_{n-1}^i\}$. Since $\tau_{n-1}^i$ is an $\{\mathcal{F}_t^i\}$-adapted stopping time, it is also $\{\mathcal{F}_t\}$-adapted. Because of the latter observation we can assert that the event $\{T \ge \tau_{n-1}^i\}$ is $\mathcal{F}_{\tau_{n-1}^i}$-measurable (since $\{T \ge t\}$ is $\mathcal{F}_{t-1}$-measurable, this being true even if t is an $\{\mathcal{F}_t\}$-adapted stopping time). Consequently $\zeta_n^i$ is independent of $1_{\{m_T^i \ge n-1\}}$. Interchanging summation and expectation and using independence in (63), we immediately obtain the desired equality.

The careful reader will of course argue that we cannot interchange summation and integration so freely. Indeed this is absolutely true. We can however write $\zeta_n^i = \max\{\zeta_n^i, 0\} - \max\{-\zeta_n^i, 0\}$ and for each component the interchange is possible, requiring only $E[\max\{\zeta_n^i, 0\}] < \infty$ and $E[\max\{-\zeta_n^i, 0\}] < \infty$, which of course is satisfied iff $E[|\zeta_n^i|] < \infty$, for the lemma to be true. ■
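Proof of Lemma 4: To prove (46) note that $|u_t - \tilde{u}_t| \le \sum_{i=1}^{K} |u_t^i - \tilde{u}_t^i|$. Using (40) we observe that we can write

$$|u_t^i - \tilde{u}_t^i| \le \big|u_t^i - u^i_{\tau^i_{m_t^i}}\big| + \big|u^i_{\tau^i_{m_t^i}} - \tilde{u}_t^i\big|, \tag{64}$$

where, by (40), $u^i_{\tau^i_{m_t^i}} - \tilde{u}_t^i = \sum_{n=1}^{m_t^i} \big([u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}] - \lambda_n^i\big)$. From the definition of the Lebesgue sampling we have $|u_t^i - u^i_{\tau^i_{m_t^i}}| \le \underline{\Delta}_i + \overline{\Delta}_i$. Now note that if $u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}$ exits from the lower end then $|[u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}] - \lambda_n^i| = |[u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}] + \underline{\Lambda}_i| \le |[u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}] + \underline{\Delta}_i|$, with the last inequality coming from (38). Similarly if $u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}$ exits from the upper end then $|[u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}] - \overline{\Lambda}_i| \le |[u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}] - \overline{\Delta}_i|$. In both cases we see that $|[u^i_{\tau_n^i} - u^i_{\tau_{n-1}^i}] - \lambda_n^i| \le |\eta_n^i|$, with $\eta_n^i$ the overshoot defined in (36). Consequently we can further upper bound (64) using the overshoot. Replacing t with T, then taking expectation and using (44) from Corollary 1, we obtain

$$E\big[|u_T^i - \tilde{u}_T^i|\big] \le \underline{\Delta}_i + \overline{\Delta}_i + E\big[|\eta_1^i|\big]\big(E[m_T^i] + 1\big) \le \underline{\Delta}_i + \overline{\Delta}_i + \max_i E\big[|\eta_1^i|\big]\big(E[m_T^i] + 1\big). \tag{65}$$

Summing over i yields

$$E\big[|u_T - \tilde{u}_T|\big] \le \max_i E\big[|\eta_1^i|\big] \Big( \sum_{i=1}^{K} E[m_T^i] + K \Big) + C. \tag{66}$$

Using now (40) we can write

$$-E_0[\tilde{u}_T^i] = -E_0\Big[\sum_{n=1}^{m_T^i} \lambda_n^i\Big] \ge -E_0[\lambda_1^i]\, E_0[m_T^i] - 2(\underline{\Lambda}_i + \overline{\Lambda}_i), \tag{67}$$

where for the last inequality we used (45) of Corollary 1 and the fact that $|\lambda_n^i| \le \underline{\Lambda}_i + \overline{\Lambda}_i$. Since by definition $\tilde{I}_0^i = -E_0[\lambda_1^i]$ is the K-L information number for the random sequence $\{\lambda_n^i\}$, we strengthen the inequality by minimizing over i. Summing the result over i yields

$$-E_0[\tilde{u}_T] \ge \big(\min_i \tilde{I}_0^i\big) \sum_{i=1}^{K} E_0[m_T^i] - 2C'. \tag{68}$$

Solving for the sum and replacing in (66) yields the desired inequality under $H_0$. A similar proof applies under $H_1$. ■

Proof of Theorem 2: The proof of this theorem is very challenging. In fact, as we will see, the most important part is demonstrating the validity of the estimates in (47). We recall that in the synchronous case, at each time instant t, we have information arriving at the fusion center from all sensors. This scenario can be easily described through i.i.d. statistics across time. Here however, due to the asynchronous communication, this is no longer as straightforward.

In order to solve this problem, let us concentrate on one sensor (say i). We know that this sensor sends the sequence of bits $\{z_n^i\}$ to the fusion center but also, indirectly, the sequence of intersampling periods $\{\delta_n^i = \tau_n^i - \tau_{n-1}^i\}$. The sequence of pairs $\{(z_n^i, \delta_n^i)\}$ is adequate to fully describe the transmission activity of sensor i to the fusion center. Note that these pairs are i.i.d. across time and independent across sensors.

Let us denote with $p_j^i(z, \delta)$ the joint pdf of the pair $(z_n^i, \delta_n^i)$ where, as usual, $j = 0, 1$ refers to the true hypothesis. We recall that $z \in \{0, 1\}$ since $z_n^i$ is a 1-bit information. We can now write the joint pdf as

$$p_j^i(z, \delta) = \pi_j^i(0)\, g_j^i(\delta|0)\, 1_{\{z=0\}} + \big(1 - \pi_j^i(0)\big)\, g_j^i(\delta|1)\, 1_{\{z=1\}}, \tag{69}$$

where $\pi_j^i(z) = P_j(z_n^i = z)$ is the probability that sensor i transmits the bit $z_n^i = z$ under hypothesis $H_j$. Similarly $g_j^i(\delta|z)$ is the pdf of $\delta_n^i$ at sensor i given that $z_n^i = z$ under hypothesis $H_j$. For example $g_j^i(\delta|0)$ denotes the pdf of the intersampling period given that the local SPRT exits from the lower end. The marginal pdf of the intersampling periods $\delta_n^i$ is simply

$$g_j^i(\delta) = \pi_j^i(0)\, g_j^i(\delta|0) + \big(1 - \pi_j^i(0)\big)\, g_j^i(\delta|1). \tag{70}$$

Suppose now that we are at time t and that the fusion center observes $m_t^i = k$ data pairs coming from sensor i. We have that $\{m_t^i = k\} = \{\delta_1^i + \cdots + \delta_k^i \le t < \delta_1^i + \cdots + \delta_k^i + \delta_{k+1}^i\} = \{0 \le t - \tau_k^i < \delta_{k+1}^i\}$ where $\tau_k^i = \delta_1^i + \cdots + \delta_k^i$. Let us now define the likelihood of the following event: "up to time t, the fusion center observes the following $m_t^i = k$ pairs $(z_1^i, \delta_1^i), \ldots, (z_k^i, \delta_k^i)$". Using the independence of the pairs across time, we can write

$$P_j\big(m_t^i = k; (z_1^i, \delta_1^i), \ldots, (z_k^i, \delta_k^i)\big) = P_j\big(0 \le t - \tau_k^i < \delta_{k+1}^i; (z_1^i, \delta_1^i), \ldots, (z_k^i, \delta_k^i)\big) = \big[1 - G_j^i(t - \tau_k^i)\big] \Big(\prod_{n=1}^{k} p_j^i(z_n^i, \delta_n^i)\Big) 1_{\{\tau_k^i \le t\}}, \tag{71}$$

where $G_j^i(\delta) = \int_0^{\delta} g_j^i(x)\,dx$ is the cdf of $\delta_n^i$ and $g_j^i(\delta)$ is the marginal pdf defined in (70).

The previous likelihood can be decomposed as follows

$$P_j\big(m_t^i = k; (z_1^i, \delta_1^i), \ldots, (z_k^i, \delta_k^i)\big) = \Big(\prod_{n=1}^{k} \pi_j^i(z_n^i)\Big) \times \Big(\big[1 - G_j^i(t - \tau_k^i)\big] \prod_{n=1}^{k} g_j^i(\delta_n^i | z_n^i)\, 1_{\{\tau_k^i \le t\}}\Big). \tag{72}$$

The first part is the likelihood of the 1-bit data $\{z_1^i, \ldots, z_k^i\}$ and the second the likelihood of the intersampling periods $\{\delta_1^i, \ldots, \delta_k^i\}$ conditioned on the 1-bit data $\{z_1^i, \ldots, z_k^i\}$.

If $\tilde{\mathcal{F}}_t^i$ denotes the $\sigma$-algebra generated by the pairs $\{(z_n^i, \delta_n^i)\}$ received up to time t, then the likelihood ratio between the two probability measures for sensor i can be written as

$$\frac{dP_1}{dP_0}\big(\tilde{\mathcal{F}}_t^i\big) = e^{\tilde{u}_t^i} \times \frac{g_1^i\big(t, \delta_1^i, \ldots, \delta^i_{m_t^i} \,\big|\, z_1^i, \ldots, z^i_{m_t^i}\big)}{g_0^i\big(t, \delta_1^i, \ldots, \delta^i_{m_t^i} \,\big|\, z_1^i, \ldots, z^i_{m_t^i}\big)}, \tag{73}$$

where

$$g_j^i\big(t, \delta_1^i, \ldots, \delta_k^i \,\big|\, z_1^i, \ldots, z_k^i\big) = \big[1 - G_j^i(t - \tau_k^i)\big] \prod_{n=1}^{k} g_j^i(\delta_n^i | z_n^i)\, 1_{\{\tau_k^i \le t\}} \tag{74}$$

expresses the likelihood of the intersampling periods conditioned on the 1-bit data, under hypothesis $H_j$. Combining all sensors and using their independence, we end up with the following likelihood ratio that refers to the complete information $\tilde{\mathcal{F}}_t$ received by the fusion center until time t

$$\frac{dP_1}{dP_0}\big(\tilde{\mathcal{F}}_t\big) = e^{\tilde{u}_t} \times \mathcal{L}_t, \tag{75}$$

with $\mathcal{L}_t$ denoting the likelihood ratio of the intersampling periods conditioned on the 1-bit data, namely

$$\mathcal{L}_t = \prod_{i=1}^{K} \frac{g_1^i\big(t, \delta_1^i, \ldots, \delta^i_{m_t^i} \,\big|\, z_1^i, \ldots, z^i_{m_t^i}\big)}{g_0^i\big(t, \delta_1^i, \ldots, \delta^i_{m_t^i} \,\big|\, z_1^i, \ldots, z^i_{m_t^i}\big)}. \tag{76}$$

We are now in a position to prove (47). Consider the first inequality. We have

$$\beta = P_1(d_{\tilde{\mathcal{S}}} = 0) = E_1\big[1_{\{\tilde{u}_{\tilde{\mathcal{S}}} \le -A\}}\big] = E_0\big[e^{\tilde{u}_{\tilde{\mathcal{S}}}} \times \mathcal{L}_{\tilde{\mathcal{S}}}\, 1_{\{\tilde{u}_{\tilde{\mathcal{S}}} \le -A\}}\big] \le e^{-A} E_0\big[\mathcal{L}_{\tilde{\mathcal{S}}}\big] = e^{-A}. \tag{77}$$

The last equality is true because, conditioning on the received 1-bit data,

$$E_0\big[\mathcal{L}_{\tilde{\mathcal{S}}}\big] = E_0\Big[ E_0\big[\mathcal{L}_{\tilde{\mathcal{S}}} \,\big|\, z_1^i, \ldots, z^i_{m^i_{\tilde{\mathcal{S}}}},\ i = 1, \ldots, K\big] \Big] = 1. \tag{78}$$

This proves the first inequality. The second can be proven in an analogous way.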

To prove the second part of the theorem, namely (48), again we consider the inequality under $H_0$. Note that

$$E_0[u_{\tilde{\mathcal{S}}}] \ge E_0[\tilde{u}_{\tilde{\mathcal{S}}}] - E_0\big[|u_{\tilde{\mathcal{S}}} - \tilde{u}_{\tilde{\mathcal{S}}}|\big]. \tag{79}$$

Using (46) from Lemma 4 the inequality becomes

$$E_0[u_{\tilde{\mathcal{S}}}] \ge (1 + \Phi)\, E_0[\tilde{u}_{\tilde{\mathcal{S}}}] - C - 2\Phi C' - K \max_i E_0\big[|\eta_1^i|\big], \tag{80}$$

where $\Phi = (\max_i E_0[|\eta_1^i|])/(\min_i \tilde{I}_0^i)$.

As in the continuous time case, we have $\tilde{u}_{\tilde{\mathcal{S}}} \ge -A - C'$ and using (47) we can write $\tilde{u}_{\tilde{\mathcal{S}}} \ge -|\log\beta| - C'$, which also implies $E_0[\tilde{u}_{\tilde{\mathcal{S}}}] \ge -|\log\beta| - C'$. Replacing the latter in (80) results in

$$E_0[u_{\tilde{\mathcal{S}}}] + |\log\beta| \ge -\Phi |\log\beta| - (1 + 3\Phi) C' - C - K \max_i E_0\big[|\eta_1^i|\big]. \tag{81}$$

If we replace, in the left hand side of the previous inequality, $|\log\beta|$ with the optimum performance $-E_0[u_{\mathcal{S}}]$, because of (33), we strengthen the inequality obtaining

$$(-E_0[u_{\tilde{\mathcal{S}}}]) - (-E_0[u_{\mathcal{S}}]) \le \Phi |\log\beta| + (1 + 3\Phi) C' + C + K \max_i E_0\big[|\eta_1^i|\big] + o(1). \tag{82}$$

Note now that $C = \Theta(\Delta)$ and for the overshoot we have $\max_i E_0[|\eta_1^i|] \le \theta$. In our analysis we consider $\Delta$ to be either of the order of a constant or to tend to infinity, and $\theta$ to be either of the order of a constant or to tend to 0. Because of this assumption and Lemma 1 we have that $\underline{\Lambda}_i, \overline{\Lambda}_i$ are $\Theta(\Delta)$, meaning that $C' = \Theta(\Delta)$. Because of Lemma 2 we conclude that $\min_i \tilde{I}_0^i \ge \Theta(\Delta)$, consequently $\Phi \le \theta/\Theta(\Delta)$. Substituting these orders of magnitude in (82) yields

$$(-E_0[u_{\tilde{\mathcal{S}}}]) - (-E_0[u_{\mathcal{S}}]) \le \frac{\theta}{\Theta(\Delta)} |\log\beta| + \Theta(\Delta) + O(\theta) + o(1). \tag{83}$$

Finally, due to the relative size of $\Delta$ and $\theta$ we can also conclude that $\Theta(\Delta) + O(\theta) + o(1) = \Theta(\Delta)$, which proves the desired version of the inequality. Similar steps can be applied to prove the theorem under hypothesis $H_1$. ■